Summary Of Work

Sources/Data Collection

The main goal of our project was to compare the statistics of every player in the 2022 NBA draft, taken from each player's most recent college, international, or G League season. We focused on the most recent season because we believed it gives a good sample of how a player was performing right before the draft. We then used these stats to predict how we expected each player to perform in the upcoming NBA season.

First, we collected individual player statistics: points per 40 minutes, plus/minus, win shares, effective field goal percentage, field goal percentage, usage rate, steals, defensive win shares, and games played. We then added position-based statistics. For point guards we chose turnover rate, assists per game, and free throw percentage, as we felt these three statistics capture the core of the position. For shooting guards and small forwards we measured three-point percentage, as we believed players at these positions generally attempt three-pointers at a much higher volume than players at other positions. Finally, for power forwards and centers we measured blocks and rebounds per game, which we determined to be the main points of focus for these positions.

We only included players who played at least 10 games for their respective teams. This cutoff meant we left Shaedon Sharpe out of our analysis, since he did not play a single game for Kentucky.

We collected college stats from Sports Reference (https://www.sports-reference.com/cbb/), international stats from RealGM (https://basketball.realgm.com/nba), and G League stats from the official G League website (https://stats.gleague.nba.com/player/1630699/usage/). We entered the data manually into an Excel spreadsheet and then converted it into separate CSV files for use in our code.
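As a rough illustration of the 10-game cutoff, the sketch below reads one CSV file and keeps only the players who meet it. The column names (`player`, `games_played`, `points_per_40`) and the rows are hypothetical placeholders, not our actual files or data.

```python
import csv
import io

# Hypothetical CSV layout; column names and rows are invented placeholders.
SAMPLE_CSV = """player,games_played,points_per_40
Player A,32,21.4
Player B,0,0.0
Player C,9,15.2
Player D,10,18.7
"""

MIN_GAMES = 10  # the cutoff described in the text


def load_eligible_players(csv_file, min_games=MIN_GAMES):
    """Return the rows for players who appeared in at least min_games games."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if int(row["games_played"]) >= min_games]


# io.StringIO stands in for an open file so the example is self-contained.
eligible = load_eligible_players(io.StringIO(SAMPLE_CSV))
print([row["player"] for row in eligible])
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by `open(path, newline="")`.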

Statistical Trends

Studying statistical trends can be a useful way to anticipate how outcomes change over time. For our project, we collected data from the last six NBA drafts (2016-2021) and measured points per 40 minutes for each player. Our data includes every player from these six drafts, whether they played in college, overseas, or in the G League before being drafted. We chose this statistic because we believe it represents each player fairly regardless of their playing time and role on their team. We displayed the trends in a histogram because we wanted to see where the overall mean falls and which players are outliers.
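The per-40 scaling and the binning behind a histogram can be sketched as follows. This is a minimal standard-library example; the season totals are invented placeholders rather than real draft data, and the 5-point bin width is an assumption.

```python
from collections import Counter
from statistics import mean


def points_per_40(total_points, total_minutes):
    """Scale a player's scoring to a 40-minute basis."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return 40.0 * total_points / total_minutes


def histogram_counts(values, bin_width):
    """Count values per bin; each key is the lower edge of its bin."""
    return Counter(int(v // bin_width) * bin_width for v in values)


# Invented (total points, total minutes) pairs, not real players.
season_totals = [(450, 900), (600, 1000), (300, 800), (520, 1040), (700, 1000)]

per_40 = [points_per_40(points, minutes) for points, minutes in season_totals]
counts = histogram_counts(per_40, bin_width=5)

for lower_edge in sorted(counts):
    print(f"{lower_edge}-{lower_edge + 5}: {'#' * counts[lower_edge]}")
print("mean:", mean(per_40))
```

Because the statistic divides by minutes played, a bench player and a starter with the same scoring rate end up in the same bin, which is the property we wanted.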

The histogram shows that this year's draft class (2022) was roughly average compared to previous years. As a result, we expect this draft to follow the usual pattern: a few All-Stars, a decent number of starters, and a handful of role players or lower-end bench players.

Player Efficiency in K-Means Clustering

Another model we used for this project was K-means clustering. K-means is a popular machine learning algorithm that assigns each data point to one of k clusters. It is an unsupervised algorithm, which means it can be used without a response variable. For our clusters we measured the efficiency of each player. Efficiency combines the basic NBA stats (points, rebounds, assists, steals, turnovers, blocks, and shot attempts) into a single rating. It is also one of the most common statistics used in the NBA. A good efficiency rating in the NBA is around 27.5 and above.
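For reference, here is a minimal sketch of one common way to compute such a rating. It assumes the standard NBA efficiency (EFF) formula, which credits points, rebounds, assists, steals, and blocks and subtracts missed shots and turnovers; the write-up does not state the exact formula behind our ratings, so treat this as an illustration. The stat line is an invented placeholder.

```python
def efficiency(pts, reb, ast, stl, blk, fga, fgm, fta, ftm, tov, games=1):
    """Standard NBA efficiency (EFF), averaged over the number of games.

    Assumption: this is the widely used EFF formula, not necessarily
    the exact rating used in the project.
    """
    missed_field_goals = fga - fgm
    missed_free_throws = fta - ftm
    total = (pts + reb + ast + stl + blk
             - missed_field_goals - missed_free_throws - tov)
    return total / games


# Invented single-game stat line, not a real player.
rating = efficiency(pts=20, reb=8, ast=3, stl=1, blk=2,
                    fga=15, fgm=8, fta=5, ftm=4, tov=2)
print(rating)
```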

I used the elbow method to find the appropriate number of clusters for our data. The elbow method plots the within-cluster variation against the number of clusters; the appropriate number is usually the point where the curve bends and begins to flatten out, since adding more clusters beyond that point yields little improvement.

As you can see, the graph begins to bend and level off when x is equal to 3. As a result, I chose 3 as the number of clusters.
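The idea behind the elbow plot can be sketched with a small one-dimensional K-means written with the standard library only. This is an illustration, not the code used for the project: a library such as scikit-learn would normally do this work, the quantile-based initialization is a simplification chosen to keep the result deterministic, and the efficiency values are invented placeholders.

```python
def kmeans_1d(values, k, max_iterations=100):
    """Lloyd's algorithm on 1-D data; returns (centroids, labels)."""
    ordered = sorted(values)
    n = len(ordered)
    # Simplified deterministic start: centroids at evenly spaced quantiles.
    centroids = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def within_cluster_sum_of_squares(values, centroids, labels):
    """The quantity plotted on the y-axis of an elbow plot."""
    return sum((v - centroids[lab]) ** 2 for v, lab in zip(values, labels))


# Invented efficiency ratings forming three loose groups.
ratings = [9, 10, 11, 19, 20, 21, 29, 30, 31]

wcss = {}
for k in range(1, 6):
    centroids, labels = kmeans_1d(ratings, k)
    wcss[k] = within_cluster_sum_of_squares(ratings, centroids, labels)
    print(k, round(wcss[k], 2))
```

On this toy data the value drops sharply up to k = 3 and barely changes afterwards, which is the bend the elbow method looks for.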

I separated the data into 3 clusters, and the results looked like this. Cluster 2 contains the players with the highest efficiency ratings overall. As expected, these players include Keegan Murray, Chet Holmgren, and the number 1 overall pick, Paolo Banchero. One small surprise was the 41st pick, E.J. Liddell, who landed in the cluster with the highest ratings; his rating was measured at 30.5. Another surprise was that the 8th and 9th overall picks (Dyson Daniels and Jeremy Sochan) landed in cluster 1.
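Reading the clusters this way amounts to grouping players by their cluster label and ranking the clusters by average efficiency. A minimal sketch, using invented placeholder players, ratings, and labels rather than our actual results:

```python
from collections import defaultdict
from statistics import mean

# Invented (player, efficiency, cluster label) rows, not real results.
rows = [
    ("Player A", 31.0, 2), ("Player B", 29.5, 2),
    ("Player C", 21.0, 0), ("Player D", 19.0, 0),
    ("Player E", 12.5, 1), ("Player F", 11.5, 1),
]

# Group players by the label the clustering step assigned to them.
clusters = defaultdict(list)
for name, rating, label in rows:
    clusters[label].append((name, rating))

# Rank cluster labels from highest to lowest average efficiency.
ranked = sorted(
    clusters,
    key=lambda label: mean(rating for _, rating in clusters[label]),
    reverse=True,
)

for label in ranked:
    names = ", ".join(name for name, _ in clusters[label])
    print(f"cluster {label}: {names}")
```

Note that K-means labels are arbitrary identifiers, so the cluster numbered 2 is not guaranteed to be the highest-rated group on every run; ranking by the cluster mean avoids relying on the label itself.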